National Repository of Grey Literature
Information Extraction from Wikipedia
Valušek, Ondřej ; Otrusina, Lubomír (referee) ; Smrž, Pavel (advisor)
This thesis deals with the automatic extraction of types and attributes from English Wikipedia articles. Several approaches based on machine learning are presented. In addition, important attributes are extracted, such as the date of birth in articles about people or the area in articles about lakes, among many others. With the system presented in this thesis, one can generate a well-structured knowledge base from a file of Wikipedia articles (a so-called dump file) and a small training set containing a few well-classified articles. Such a knowledge base can then be used for the semantic enrichment of text. During this process a file with so-called definition words is generated. Definition words are features extracted by natural-language analysis, and they could also be used in ways other than those explored in this thesis. There is also a component that can determine which articles were added, deleted, or modified between the creation of two different knowledge bases.
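As a minimal sketch of the type-classification step this abstract describes (not the author's implementation), one might train a classifier on a handful of well-classified articles and apply it to the rest of the articles parsed from the dump file; the feature choice (TF-IDF over the opening paragraph) and all names below are illustrative assumptions.

```python
# Minimal sketch: classify article types from a small labelled training set.
# Feature choice and example texts are assumptions, not the thesis method.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# small training set of well-classified articles: (first paragraph, type label)
train_texts = [
    "Lake Tahoe is a large freshwater lake in the Sierra Nevada ...",
    "Alan Turing was an English mathematician and computer scientist ...",
]
train_labels = ["lake", "person"]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                           LogisticRegression(max_iter=1000))
classifier.fit(train_texts, train_labels)

# articles parsed from the Wikipedia dump file would be classified the same way
print(classifier.predict(["Lough Neagh is a body of water in Northern Ireland ..."]))
```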
Important Entity Recognition in Web Page Text
Svítková, Veronika ; Hynek, Jiří (referee) ; Burget, Radek (advisor)
The aim of this thesis is to train named entity recognition models on datasets created from structured data. The datasets were built from the names of products and books extracted from structured data in the JSON-LD and Microdata formats. The structured data were obtained by web scraping from e-shop and social cataloging websites. The names were used both as a dataset on their own and as automatic annotations of their matches in webpage text. In total, eight Czech-language models for recognizing product and book names were trained using the spaCy library. Evaluated on a test dataset created for this purpose, the models reach F-scores of up to 89.94 for products and up to 84.26 for books.
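As a rough sketch of the automatic annotation step described above (not the thesis code), product names harvested from JSON-LD/Microdata can be matched in the page text and stored as entities in spaCy's binary training format; the entity label, example text, and file path are assumptions for illustration.

```python
# Sketch: turn scraped product names into automatically annotated spaCy training data.
import spacy
from spacy.tokens import DocBin

nlp = spacy.blank("cs")                      # blank Czech pipeline, tokenizer only
db = DocBin()

page_text = "Nový telefon Galaxy S21 je skladem za 19 990 Kč."
product_names = ["Galaxy S21"]               # names extracted from structured data

doc = nlp.make_doc(page_text)
spans = []
for name in product_names:
    start = page_text.find(name)
    if start != -1:
        span = doc.char_span(start, start + len(name), label="PRODUCT",
                             alignment_mode="expand")
        if span is not None:
            spans.append(span)
doc.ents = spans
db.add(doc)
db.to_disk("train.spacy")                    # input for `python -m spacy train`
```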
Semantic Analysis of Parish Records
Kaňkovský, Adam ; Zbořil, František (referee) ; Rozman, Jaroslav (advisor)
The aim of this work is to design and implement an application for the semantic analysis of parish records, which takes as input text obtained from a scan of a parish register. The extracted information is then entered into the appropriate fields of a table.
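Since the abstract does not detail the extraction method, the following is only an illustrative sketch of the field-filling step: text transcribed from a parish-register scan is mapped to table columns with simple patterns. The record layout and field names are assumptions, not the thesis approach.

```python
# Illustrative only: map a transcribed parish-register entry onto table fields.
import re

record_text = "Jan Novák, narozen 12. 3. 1887, otec Josef Novák, matka Marie Nováková"

fields = {
    "name":   re.match(r"^([^,]+)", record_text).group(1).strip(),
    "born":   re.search(r"narozen[a]?\s+([\d.\s]+\d)", record_text).group(1).strip(),
    "father": re.search(r"otec\s+([^,]+)", record_text).group(1).strip(),
    "mother": re.search(r"matka\s+(.+)$", record_text).group(1).strip(),
}
print(fields)   # values would be written into the corresponding table cells
```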
Call Sign Detection and Recognition in VHF Communication
Dedič, Juraj ; Kocour, Martin (referee) ; Szőke, Igor (advisor)
This work explores the processing of data from air traffic communication in order to detect and recognize the call signs it contains. In particular, it involves recognizing these call signs in human-made and automated text transcripts of the communication between pilots and air traffic controllers. The thesis compares various approaches to this task and describes their problems. It implements a system for identifying these call signs using a suitable technology based on large language models. One of the outputs of this work is a service that is able to recognize the call signs, which enables efficient indexing and sorting of the data.
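The thesis builds its own LLM-based service; as a hedged illustration of the general idea only, a transcript can be passed to an off-the-shelf large language model with an extraction prompt. The provider, model name, and prompt below are assumptions made for this example and are not taken from the thesis.

```python
# Hedged sketch: prompt a general-purpose LLM to extract call signs from an ATC transcript.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

transcript = "ryanair six seven alpha bravo descend flight level one two zero"

response = client.chat.completions.create(
    model="gpt-4o-mini",   # assumed model choice for illustration
    messages=[{
        "role": "user",
        "content": "List every aircraft call sign mentioned in this ATC "
                   f"transcript, one per line, in ICAO format:\n{transcript}",
    }],
)
print(response.choices[0].message.content)
```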
